21 research outputs found

    Component-based Attention for Large-scale Trademark Retrieval

    The demand for large-scale trademark retrieval (TR) systems has significantly increased to combat the rise in international trademark infringement. Unfortunately, the ranking accuracy of current approaches using either hand-crafted or pre-trained deep convolutional neural network (DCNN) features is inadequate for large-scale deployments. We show in this paper that the ranking accuracy of TR systems can be significantly improved by incorporating hard and soft attention mechanisms, which direct attention to critical information such as figurative elements and reduce the attention given to distracting and uninformative elements such as text and background. Our proposed approach achieves state-of-the-art results on a challenging large-scale trademark dataset. Comment: Fixed typos related to the authors' information.
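The abstract does not give implementation details, but the soft-attention idea it describes (up-weight informative spatial locations of a DCNN feature map before pooling) can be sketched in a few lines of NumPy. The attention score used here (the L2 norm of each location's feature vector, a common parameter-free proxy) is an assumption; the paper learns its attention instead.

```python
import numpy as np

def soft_attention_pool(features):
    """Weight each spatial location of a CNN feature map by an attention
    score, then sum-pool into a single global descriptor.

    features: array of shape (H, W, C), e.g. a last-conv-layer output.
    """
    h, w, c = features.shape
    flat = features.reshape(h * w, c)
    scores = np.linalg.norm(flat, axis=1)               # (H*W,) per-location scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                            # softmax over locations
    descriptor = (weights[:, None] * flat).sum(axis=0)  # attention-weighted pooling
    return descriptor / (np.linalg.norm(descriptor) + 1e-12)

rng = np.random.default_rng(0)
fmap = rng.standard_normal((7, 7, 512))
desc = soft_attention_pool(fmap)
```

Hard attention would instead zero out the weights of locations detected as text or background before the softmax.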

    MTRNet: A Generic Scene Text Eraser

    Text removal algorithms have been proposed for uni-lingual scripts with regular shapes and layouts. However, to the best of our knowledge, a generic text removal method which is able to remove all or user-specified text regions regardless of font, script, language or shape is not available. Developing such a generic text eraser for real scenes is a challenging task, since it inherits all the challenges of multi-lingual and curved text detection and inpainting. To fill this gap, we propose a mask-based text removal network (MTRNet). MTRNet is a conditional generative adversarial network (cGAN) with an auxiliary mask. The introduced auxiliary mask not only makes the cGAN a generic text eraser, but also enables stable training and early convergence on a challenging large-scale synthetic dataset initially proposed for text detection in real scenes. Moreover, MTRNet achieves state-of-the-art results on several real-world datasets, including ICDAR 2013, ICDAR 2017 MLT, and CTW1500, outperforming previous state-of-the-art methods trained directly on these datasets, without being explicitly trained on them. Comment: Presented at the ICDAR 2019 conference.
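One common way to condition an inpainting cGAN on an auxiliary mask is to blank out the masked pixels and append the mask as an extra input channel. The sketch below illustrates that input construction; the exact input layout MTRNet uses may differ, so treat the channel arrangement as an assumption.

```python
import numpy as np

def make_generator_input(image, text_mask):
    """Build a mask-conditioned generator input.

    image:     float array (H, W, 3), the scene image.
    text_mask: binary array (H, W), 1 where text should be erased.
    Returns a (H, W, 4) tensor: the image with masked pixels blanked,
    plus the mask itself as a fourth channel.
    """
    masked = image * (1.0 - text_mask[..., None])   # blank out text regions
    return np.concatenate([masked, text_mask[..., None]], axis=-1)

img = np.ones((4, 4, 3))
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0                                # user-specified text region
x = make_generator_input(img, mask)
```

Because the generator only ever sees blanked pixels inside the mask, the same network erases whatever region the mask selects, which is what makes the eraser "generic" and user-controllable.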

    Learning Test-time Data Augmentation for Image Retrieval with Reinforcement Learning

    Off-the-shelf convolutional neural network features achieve outstanding results in many image retrieval tasks. However, their invariance is pre-defined by the network architecture and training data. Existing image retrieval approaches require fine-tuning or modification of the pre-trained networks to adapt to the variations in the target data. In contrast, our method enhances the invariance of off-the-shelf features by aggregating features extracted from images augmented with learned test-time augmentations. The optimal ensemble of test-time augmentations is learned automatically through reinforcement learning. Our training is time- and resource-efficient, and learns a diverse set of test-time augmentations. Experimental results on trademark retrieval (the METU trademark dataset) and landmark retrieval (the Oxford5k and Paris6k scene datasets) show that the learned ensemble of transformations is effective and transferable. We also achieve state-of-the-art mAP@100 results on the METU trademark dataset.
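The aggregation step described above (average the normalised features of the augmented views, then re-normalise) can be sketched directly; what the paper actually learns via reinforcement learning is which augmentations to include, which this toy sketch takes as given. The stand-in extractor and augmentations are hypothetical.

```python
import numpy as np

def aggregate_tta_features(extract, image, augmentations):
    """Average L2-normalised features over a set of test-time
    augmentations, then re-normalise the ensemble descriptor."""
    feats = []
    for aug in augmentations:
        f = extract(aug(image))
        feats.append(f / (np.linalg.norm(f) + 1e-12))
    agg = np.mean(feats, axis=0)
    return agg / (np.linalg.norm(agg) + 1e-12)

# Toy stand-ins: 'extract' is a per-channel mean; augmentations are flips.
extract = lambda im: im.mean(axis=(0, 1))
augs = [lambda im: im, lambda im: im[:, ::-1], lambda im: im[::-1, :]]
rng = np.random.default_rng(1)
image = rng.random((8, 8, 16))
feat = aggregate_tta_features(extract, image, augs)
```

Since aggregation happens purely at test time, the pre-trained network itself never needs fine-tuning, which is the paper's point of contrast with prior work.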

    Towards Self-Explainability of Deep Neural Networks with Heatmap Captioning and Large-Language Models

    Heatmaps are widely used to interpret deep neural networks, particularly for computer vision tasks, and heatmap-based explainable AI (XAI) techniques are a well-researched topic. However, most studies concentrate on enhancing the quality of the generated heatmap or discovering alternative heatmap generation techniques, and little effort has been devoted to making heatmap-based XAI automatic, interactive, scalable, and accessible. To address this gap, we propose a framework that includes two modules: (1) context modelling and (2) reasoning. We propose a template-based image captioning approach for context modelling to create text-based contextual information from the heatmap and input data. The reasoning module leverages a large language model to provide explanations in combination with specialised knowledge. Our qualitative experiments demonstrate the effectiveness of our framework and heatmap captioning approach. The code for the proposed template-based heatmap captioning approach will be made publicly available.
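The core of template-based heatmap captioning is to extract structured facts from the heatmap (e.g. where the salient region lies) and slot them into a sentence template. The template and the 3x3 localisation grid below are hypothetical simplifications; the paper's templates are richer.

```python
import numpy as np

def caption_heatmap(heatmap, class_name,
                    rows=("top", "middle", "bottom"),
                    cols=("left", "centre", "right")):
    """Turn a saliency heatmap into a sentence via a fixed template:
    locate the peak activation on a 3x3 grid and fill in the slots."""
    h, w = heatmap.shape
    r, c = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    row = rows[min(r * 3 // h, 2)]   # coarse vertical position of the peak
    col = cols[min(c * 3 // w, 2)]   # coarse horizontal position of the peak
    return (f"The model's evidence for '{class_name}' is concentrated "
            f"in the {row}-{col} region of the image.")

hm = np.zeros((9, 9))
hm[0, 8] = 1.0                       # peak in the top-right corner
text = caption_heatmap(hm, "trademark logo")
```

A sentence like this is what the reasoning module would then pass to the large language model as textual context.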

    A facile solid-state heating method for preparation of poly(3,4-ethylenedioxythiophene)/ZnO nanocomposite and photocatalytic activity

    Poly(3,4-ethylenedioxythiophene)/zinc oxide (PEDOT/ZnO) nanocomposites were prepared by a simple solid-state heating method, in which the content of ZnO was varied from 10 to 20 wt%. The structure and morphology of the composites were characterized by Fourier transform infrared (FTIR) spectroscopy, ultraviolet-visible (UV-vis) absorption spectroscopy, X-ray diffraction (XRD), and transmission electron microscopy (TEM). The photocatalytic activities of the composites were investigated through the degradation of methylene blue (MB) dye in aqueous medium under UV light and natural sunlight irradiation. The FTIR, UV-vis, and XRD results showed that the composites were successfully synthesized and that there was a strong interaction between PEDOT and nano-ZnO. The TEM results suggested that the composites were a mixture of shale-like PEDOT and less aggregated nano-ZnO. The photocatalytic activity results indicated that incorporating ZnO nanoparticles can enhance the photocatalytic efficiency of the composites under both UV light and natural sunlight irradiation; the highest efficiencies after 5 h, 98.7% under UV light and 96.6% under natural sunlight, occurred for the PEDOT/15 wt% ZnO nanocomposite.
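The reported percentages follow the standard degradation-efficiency formula, (C0 - Ct) / C0 x 100, where C0 is the initial and Ct the remaining dye concentration (or, via the Beer-Lambert law, absorbance). The residual concentrations below are illustrative numbers chosen to reproduce the reported efficiencies, not values from the paper's raw data.

```python
def degradation_efficiency(c0, ct):
    """Photocatalytic degradation efficiency in percent:
    (C0 - Ct) / C0 * 100, with C0 the initial and Ct the remaining
    methylene blue concentration."""
    return (c0 - ct) / c0 * 100.0

# Illustrative residuals consistent with the reported 5 h efficiencies:
uv = degradation_efficiency(1.0, 0.013)    # UV light, about 98.7 %
sun = degradation_efficiency(1.0, 0.034)   # natural sunlight, about 96.6 %
```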

    MTRNet++: One-stage Mask-based Scene Text Eraser

    A precise, controllable, interpretable and easily trainable text removal approach is necessary for both user-specific and large-scale text removal applications. To achieve this, we propose a one-stage mask-based text inpainting network, MTRNet++. It has a novel architecture that includes mask-refine, coarse-inpainting and fine-inpainting branches, and attention blocks. With this architecture, MTRNet++ can remove text either with or without an external mask. It achieves state-of-the-art results on both the Oxford and SCUT datasets without using external ground-truth masks. The results of ablation studies demonstrate that the proposed multi-branch architecture with attention blocks is effective and essential. MTRNet++ also demonstrates controllability and interpretability. Comment: This paper is under CVIU review (after major revision).

    Missing ingredients in optimising large-scale image retrieval with deep features

    This thesis applies advanced image processing and deep machine learning techniques to solve the challenges of large-scale image retrieval. Solutions are provided to overcome key obstacles in real-world large-scale image retrieval applications by introducing unique methods for making deep learning systems more reliable and efficient. The outcome of the research is useful for several image retrieval applications, including patent search and trademark and logo infringement analysis.

    METU dataset: A big dataset for benchmarking trademark retrieval

    Trademark retrieval (TR) is the problem of retrieving trademarks (logos) similar to a query, with the main aim of detecting copyright infringement. Since there are millions of companies worldwide, automatically retrieving similar trademarks has become an important problem; currently, checking for trademark infringement is still mostly performed manually by humans. However, although there have been many attempts at automated TR, the problem, as acknowledged in the community, remains largely unsolved. One of the main reasons is the lack of a publicly available, comprehensive dataset that covers the various challenges of the TR problem. In this article, we propose and introduce a large dataset composed of more than 930,000 trademarks, and we evaluate the existing approaches in the literature on it. We show that the existing methods are far from being useful on such a challenging dataset, and we hope that the dataset can facilitate the development of better methods and drive progress in the performance of trademark retrieval systems.
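The retrieval step the benchmark evaluates is, at its core, nearest-neighbour search over image descriptors. A minimal sketch, assuming cosine similarity over whatever features a method produces (the extractor itself is out of scope here):

```python
import numpy as np

def rank_by_similarity(query_feat, gallery_feats):
    """Rank gallery trademarks by cosine similarity to the query.
    gallery_feats: (N, D) matrix, one descriptor per trademark."""
    q = query_feat / (np.linalg.norm(query_feat) + 1e-12)
    g = gallery_feats / (np.linalg.norm(gallery_feats, axis=1, keepdims=True) + 1e-12)
    sims = g @ q                        # cosine similarity to every gallery item
    return np.argsort(-sims), sims      # indices, most to least similar

rng = np.random.default_rng(2)
gallery = rng.standard_normal((930, 64))  # small stand-in for the 930k logos
gallery[17] = 1.0                         # plant one entry that matches the query
query = np.ones(64)
order, sims = rank_by_similarity(query, gallery)
```

Evaluation then reduces to checking where the known-similar trademarks land in `order` for each benchmark query.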

    Noisy Uyghur Text Normalization

    Uyghur is the second largest and most actively used social media language in China. However, a non-negligible part of the Uyghur text appearing in social media is unsystematically written with the Latin alphabet, and the amount of such text continues to grow. Uyghur text in this format is incomprehensible and ambiguous even to native Uyghur speakers. In addition, text in this form cannot be used to advance NLP tasks for the Uyghur language. Restoring noisy Uyghur text written in unsystematic Latin script, and preventing its spread, will be essential to protecting the Uyghur language and to improving the accuracy of Uyghur NLP tasks. To this end, we propose and compare the noisy channel model and the neural encoder-decoder model as normalization methods.
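The noisy channel model mentioned above scores each clean candidate w for a noisy token by P(w) * P(noisy | w), i.e. a language model times a channel model, and picks the argmax. A minimal sketch with a hypothetical two-word vocabulary (the paper trains both models on real Uyghur data):

```python
import math

def normalize(noisy, lm, channel):
    """Noisy channel normalisation: return the clean word w maximising
    log P(w) + log P(noisy | w).

    lm:      dict mapping clean words to unigram probabilities.
    channel: dict mapping (clean, noisy) pairs to channel probabilities.
    """
    best, best_score = None, -math.inf
    for w, p_w in lm.items():
        p_ch = channel.get((w, noisy), 1e-9)   # smoothing for unseen pairs
        score = math.log(p_w) + math.log(p_ch)
        if score > best_score:
            best, best_score = w, score
    return best

# Hypothetical miniature models: two clean candidates for noisy 'yahshi'.
lm = {"yaxshi": 0.6, "yehshi": 0.4}
channel = {("yaxshi", "yahshi"): 0.3, ("yehshi", "yahshi"): 0.2}
word = normalize("yahshi", lm, channel)
```

The neural encoder-decoder alternative learns the same noisy-to-clean mapping end to end instead of factoring it into two models.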

    A Large-scale Dataset and Benchmark for Similar Trademark Retrieval

    Trademark retrieval (TR) has become an important yet challenging problem due to the ever-increasing number of trademark applications and infringement incidents. There have been many promising attempts at the TR problem which, however, proved impractical since they were evaluated on limited and mostly trivial datasets. In this paper, we provide a large-scale dataset with benchmark queries on which different TR approaches can be evaluated systematically. Moreover, we provide a baseline on this benchmark using the methods widely applied to TR in the literature. Furthermore, we identify and correct two important issues in TR approaches that were not addressed before: reversal of contrast and the presence of irrelevant text in trademarks, both of which severely affect TR methods. Lastly, we apply deep learning, namely several popular convolutional neural network models, to the TR problem. To the best of the authors' knowledge, this is the first attempt to do so.
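Of the two corrections the abstract names, reversal of contrast is the easier one to illustrate: the same logo rendered light-on-dark and dark-on-light yields very different raw features, so images are normalised to one polarity before matching. The border-pixel heuristic below is one plausible way to detect the background shade, not necessarily the paper's exact rule.

```python
import numpy as np

def normalize_contrast(gray):
    """Undo 'reversal of contrast': if a trademark appears to be
    light-on-dark, invert it so every image is compared dark-on-light.

    gray: float array (H, W) with values in [0, 1].
    """
    # Sample the image border as a proxy for the background shade.
    border = np.concatenate([gray[0], gray[-1], gray[:, 0], gray[:, -1]])
    return 1.0 - gray if border.mean() < 0.5 else gray

dark_bg = np.zeros((5, 5))
dark_bg[2, 2] = 1.0                  # white mark on a black background
fixed = normalize_contrast(dark_bg)  # inverted to a black mark on white
```

The second correction, removing irrelevant text from trademarks, would sit in front of this step as a text-detection-and-masking stage.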